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Abstract 

In this paper, we propose a novel approach to modeling nonstationary spa- 
tial fields. The proposed method works by expanding the geographic plane 
over which these processes evolve into higher dimensional spaces, transforming 
and clarifying complex patterns in the physical plane. By combining aspects 
of multi-dimensional scaling, group lasso, and latent variable models, a dimen- 
sionally sparse projection is found in which the originally nonstationary field 
exhibits stationarity. Following a comparison with existing methods in a simu- 
lated environment, dimension expansion is studied on a classic test-bed data set 
historically used to study nonstationary models. Following this, we explore the 
use of dimension expansion in modeling air pollution in the United Kingdom, a 
process known to be strongly influenced by rural/urban effects, amongst others, 
which gives rise to a nonstationary field. 



1 Introduction 

Recently there has been great interest in using spatial statistical methods to 
model environmental processes, with the aim of both gaining an improved un- 
derstanding of underlying processes and making predictions at locations where 
measurements of a process are not available. The majority of such methods 



make the assumption that the underlying process is stationary (Cressie (1993)) 
which, for many environmental systems, may be untenable. 

In this paper, we focus on accurately explicating the nonstationary structure 
that often arises in measurements of atmospheric, agricultural, and other envi- 
ronmental systems. If these systems share one underlying theme it is complex 
spatial structures, being influenced by such features as topography, weather, 
and other environmental factors. For example, the air quality characteristics of 
cities are likely to be more similar than that of rural areas irrespective of their 
geographic proximity. Ideally we might model these effects directly; however, 



information on the underlying causes is often not routinely available. Hence 
when modeling environmental systems there exists a need for a class of models 
that are more complex than those which rely on the assumption of stationarity. 

In the field of atmospheric science, empirical orthogonal functions have been 
used to model a nonstationary process as the sum of a stationary isotropic pro- 
cess and a set of basis functions with random coefficients representing departures 
from nonstationarity ( Nychka and Saltzman ( 1998 ) , Nychka et al. ( 2002 ) ) . Cur- 
rent approaches to modeling nonstationary processes in the statistical literature 
broadly comprise those that (i) combine locally stationary processes to create 
an overall nonstationary process and (ii) 'image warping'. 

A number of approaches for handling nonstationarity assume that over small 
enough spatial domains, the effects of nonstationarity are negligible, and hence 
locally stationary models may be used. This concept is the basis of kernel 
approaches, early examples of which can be found in [Baas] (|1990a|b[ ). The 



process-convolution approach (Higdon (1998), Higdon et al. (1999)) relies on 



the notion that a wide range of stationary Gaussian processes may be expressed 
as a kernel convolved with a Gaussian white noise process, with the kernel being 
allowed to vary spatially to account for nonstationarity. The form of the kernel 
allows for a broad expression of potential covariance functions, with a Gaussian 
kernel corresponding to a Gaussian covariance function and other choices of ker- 
nel resulting in other correlation structures. Similarly, Fuentes (2001) suggested 
modeling the field as a weighted average of local stationary processes within a 
set of regions, an idea which was later extended to include a continuous convo- 
lution of stationary processes (Fuentes and Smith (2007)). Various difficulties 
still remain in this class of models, including the lack of a complete and easily 
interpretable global model and the choice of local regions and details of the 
weight structures. An alternative approach proposed by Sampson and Guttorp| 
( 1992 ) is that of "image warping" , the central idea of which is that a nonstation- 
ary process may be stationary in a deformed, or warped, version of geographic 
space. Multi-dimensional scaling (or related methods) can be used to find the 
deformed locations with a mapping between the original and deformed space 
found using, for example, a thin plate spline. 

The principal idea underlying the proposed method is that of embedding the 
original field in a space of higher dimension where it can be more straightfor- 
wardly described and modelled. Specifically, we shift the dimensionality of the 
problem from 2 or 3 dimensions to 4, 5, or more in order to recover stationarity 
in the process; we term our methodology "dimension expansion." Our starting 
point is that nonstationary systems may be represented as low-dimensional pro- 
jections of high-dimensional stationary systems (see, for example, Perrin and 



Meiring (2003)). The method is superficially similar to that of image warping; 



however, it differs fundamentally in that here the locations in the geographic 
space are retained, with added flexibility obtained through the extra dimen- 
sions. Additionally, it addresses one of the major issues with the image warping 
approach, namely folding of the space. This occurs in image warping if the esti- 
mate of the function that transforms from geographical to deformed space is not 
injective. As a result of folding, two geographically distinct locations may be 
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mapped to the same location, meaning the variation between them will be incor- 
rectly treated as measurement error and small scale variation (i.e. the nugget), 
which is expressly appropriate only for collocated and other proximal monitor- 
ing sites. In such cases, mapping quantities such as prediction intervals becomes 
particularly challenging both in terms of implementation and interpretation. 

The remainder of the paper is organised as follows: Section 2 introduces 
the dimension expansion framework proposed here, including an illustrative ex- 
ample to demonstrate the fundamental concepts behind the approach. This 
example is then used to draw comparisons to image warping. In Section 3, di- 
mension expansion is applied to two real life examples. First, the solar radiation 
dataset originally used in Sampson and Guttorp ( 1992 ) and used as a test-bed 
in various more recent image warping papers is studied. Second, we study 
air pollution from seventy-seven monitoring locations in the United Kingdom 
which show clear signs of nonstationarity. We highlight the ability of dimension 
expansion to accurately model such data as measured through cross-validated 
prediction error. Finally, Section 4 provides a discussion and suggestions for 
future developments. 



2 Dimension Expansion 



While early work ( Cressie| ([1993 )) dealt primarily with stationary models, it is 



now generally recognized that many spatial processes {Y(x) : x e S}, (S £ 
7l d ) fail to satisfy this assumption. Environmental systems might exhibit be- 
haviour that looks locally stationary, yet when considered over large and het- 
erogenous domains they very often exhibit nonstationarity. For ease of notation, 
we consider Y{x) to be a (potentially nonstationary) mean-zero Gaussian pro- 
cess and place our emphasis on representing the nonstationary structure. 

A principal task in spatial statistics is estimating a variogram model (or 
correlation function) to explain spatial dependencies. The theoretical variogram, 
defined as 

2 7 ( Xi , x j ) = E(\Y(x i )-Y(x j )\ 2 ) 

is typically modeled using a parametric stationary variogram 70(h) depending 
only on h = Xi — Xj, the difference vector between locations, and the parame- 
ter^) cf>. If the field is nonstationary, such a model will be a misspecification. In 
response, we transform the set of observed spatial locations S € lZ d into one of 
higher dimension S' £ lZ d+p , where p > and S is a subset of the dimensions of 
S' . Put plainly, such an approach amounts to allowing extra dimensions for the 
observed locations x±, . . . , x s , notated as zi, . . . , z s such that the field l^([a;, z}) 
is stationary with a variogram model ^^([xijZi] — [xj,Zj]). Here [x,z] is the 
concatenation of the dimensions x and z. 



Perrin and Meiring ( 2003 ) explore this idea in the particular case where both 
the covariance function and the expansion from a; to [x, z] are known. In their 
motivating example, they consider the following stationary covariance on the 
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plane: 



cov(Y([xi, Zi]),Y([xj, zj])) =exp(-\xi -Xj 



By restricting to the set z = x 2 and defining Y'(x) = Y([x,x 2 ]), the resulting 
covariance function on this reduced-dimension field is nonstationary, namely 



cov(Y'(xi), Y'(xj)) = exp(— |o 



) exp(l 



Perrin and Meiring ( 2003[ ) then consider the reverse problem, proving that a 
nonstationary random field indexed by 7Z d (with moments of order greater than 
2) can always be represented as second-order stationary in lZ 2d . It is not, how- 
ever, necessary to move from TZ d to TZ 2d to obtain the existence of a stationary 
field. Consider a recent result of Perrin and Schlather (2007), which states that 



a Gaussian random vector can always be interpreted as a realization of a station- 
ary field in lZ p ,p > 2, subject to moment constraints on the vector. From this it 
is straightforward to state that, similarly, a realization of a Gaussian process in 
TZ d may be interpreted as a realization of a stationary field in lZ d+p ,p > 2 (simi- 
larly, subject to moment constraints) , with the covariance function ignoring the 
initial d dimensions. 

The above results show the existence of higher-dimensional stationary repre- 
sentations for nonstationary fields, yet in the vast majority of situations neither 
a nonstationary variogram, nor an analytic mapping to higher dimensions, is 
known. Here we construct a framework for using higher-dimensional represen- 
tations to model nonstationary systems, with the goal of learning the latent 
dimensions nonparametrically from information contained within the data. 

To learn the expanded, or latent, dimensions Zi, . . . , z a we propose 



(1) 



i<j 



where u*- estimates the spatial dispersion between sites i and j, for example 



i J2 \Y( Xi ) - Y(x 



2 

on ' 



with r > 1 indexing multiple observations of the system, the handling of which is 
considered in the discussion, and di,j([X, Z]) is the i,j th element of the distance 
matrix of the (augmented) locations [X, Z\. Once the matrix Z £ 1Z S x 1Z P is 
found, a function / is built such that f(X) w Z. While a wide range of options 
exist, we follow Sampson and Guttorp| (1992) in using thin plate splines, here 
one for each dimension of Z . The smoothing parameter of the thin plate spline 
(denoted A2) is used to control the smoothness of the resulting warped space 
through penalization of the bending energy 
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dx\dx2i (for d = 2), 
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and is therefore analogous to Xiw, the thin plate spline parameter in the image 
warping framework. Setting A2 = results in an interpolating spline, whereas 
A-2 — > 00 results in the linear least squares fit. The nonlinear functions / are 
therefore linear combinations of basis functions centered at the locations S E lZ d . 
Once a model is built in the expanded space, / _1 will bring us from the manifold 
in K d+P defined by (X, f(X)),X G K d back to the original space. 

Due to our unique formulation, we have f~ 1 (Z) = X, and we need not be 
concerned with the difficulties associated with ensuring that / is bijective as 
in earlier approaches. Thus we may view the originally observed locations X 
as a projection from a manifold within a higher dimensional space, [X, Z], in 
which the process is stationary. As an obvious (and direct) example, a process 
which is stationary given both geographical location and elevation may result 
in a nonstationary field given only longitude and latitude. Learning the latent 
dimensions (whether or not they have a physical meaning, such as elevation) 
means that a stationary model may be used in the expanded space. 

In many situations, it is unclear how many additional dimensions are needed 
to accurately model the spatial field. One could use cross-validation or a model 
selection technique to determine the dimensionality of Z; however, recognizing 
that might result in overfitting the spatial dispersions, we would also like to 
regularize the estimation of Z. As a result, we modify Q by including a group 



lasso penalty term on Z, where the groups are the dimensions of Z (Yuan and 



Lin (2006)). The resulting objective function is 



$,Z = argmin^^. - ^(d itj ([X, Z']))f + Ai £ (2) 

i<j k=i 

where Z' k is the fc-th column (dimension) of Z' . As a consequence of this revised 
objection function, one need only determine a maximum number of dimensions 
p and the parameter Ai, whereupon the learned augmented dimensions Z will 
be both regularized towards zero and sparse in dimension. Hence Ai can be 
viewed as regularizing the estimation of Z and determining the dimension of 
the problem, whereas A2 controls the smoothness of the augmented dimensions; 
we suggest learning both through cross-validation, although other model fit 
diagnostics or prior information may be used as well. 

It is relatively straightforward to solve ^ using the gradient projection 



method of Kim et al. (2006), which conducts block- wise updates for group lasso 
with general loss functions. Here the blocks are the dimensions of Z, and 
hence the optimization is efficient even for a large number of spatial locations. 
Optimization details are given in the appendix. For ease of exposition we use 
an exponential variogram, 

7^(xi,x a ) = 0i (1 - exp(-||xi - X2II/02)) + 03, 

which works well in the examples that follow, although the method applies 
analogously to other variograms. 
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Variogram (with Latent Locations) 



Variogram (2D Observations) 




Figure 1: Empirical variograms from the original process (left) as well as a 2- 
D projection (right) on the illustrative ellipsoid example. A fitted exponential 
variogram is shown by the solid line. 



2.1 Illustrative example 

We now present an illustrative example to help explain the concepts behind this 
proposed dimension expansion approach, as well as demonstrate the inability of 
image warping to handle complex nonstationarity. Specifically, we simulate a 
Gaussian process with s — 100 locations on a 3-dimensional ellipsoid centered 
at (0, 0, 0) such that the projection to the first 2 dimensions is a disk centered at 
the origin. Figure[l]plots the empirical variograms for the original 3-dimensional 
space as well as the 2-dimensional projection, the latter of which results in a 
highly noisy empirical variogram cloud. Our goal is to recover the lost dimension 
through dimension expansion by optimizing ^ with Xi — 0.5, chosen to induce 
Z to have one dimension. Here we calculate the matrix of empirical dispersions 
v*j using 1000 realizations of the Gaussian process. Figure [2] shows the resulting 
learned locations as well as the corresponding empirical variogram, where we see 
that dimension expansion is capable of recovering the lost dimension, resulting 
in a variogram that closely reproduces the original. 



2.2 Image Warping and Folding 



In the image warping approach, Sampson and Guttorp ( 1992 1 employ non- 



metric multidimensional scaling to move the locations along the geographic 
space, followed by fitting of the variogram 7^ using traditional variogram fit- 
ting methods. From this, a function / is found to go from the original to the 
warped locations, and back via / . A number of adaptations of this approach 
have been proposed. Smith ( 1996 ) proposed modeling the covariance function 



as a linear combination of radial basis functions using maximum likelihood (as 
suggested by Mardia and Goodall (1993)). Monestiez and Switzer (1991) and 
Monestiez et al. (1993) noted that the multi-stage algorithm of Sampson and 
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Learned Locations (Z) 



Variogram (with Learned Locations) 




Figure 2: Learned latent locations (left) using A2 = 10~ 4 as well as correspond- 
ing empirical variogram (right) after dimension expansion is applied. The fitted 
exponential variogram is shown by the solid line. 



Guttorp does not correspond to a unified optimization problem and instead 
propose finding the locations and fitting the variogram using a single objective 
func tion, an approach also pursued by Meiring et al. (1997). ft is worth noting 
that Monestiez and Switzer ( 1991 ) also explore mappings from 1Z 2 to 1Z 3 in 
the context of analyzing acid rain data, as the same-dimension mapping was 
incapable of describing the nonstationarity arising in the observed field. In a 
similar vein, Iovleff and Perrin (2004) propose using simulated annealing to fit 
the spatial deformation model. Rather than imposing smoothness on the defor- 
mation through thin plate splines, they use Delauney triangulation to constrain 
the mapping / from folding on itself. In order to acknowledge the uncertainty 
associated with the deformed locations, Damian et al.| (| 2001[ ) 



Schmidt and 



O'Hagan (2003), and recently Schmidt et al. (2011 1 have proposed Bayesian im- 



plementations of this approach, the latter additionally using observed covariate 
information to warp into higher dimensions. 

As described in the introduction, the image warping framework can suffer 
from problems of folding, namely of / not being bijective (See Zidek et al. ( 2000 1 
for a particularly extreme example of folding). Considering the illustrative ex- 
ample of Section 2.1, admittedly designed to be illustrative of such folding, we 
apply the image warping technique (Sampson and Guttorp ( 1992 1) with / mod- 
eled as a thin plate spline. Because the image warping framework contains no 
term similar to Ai to regularize the warped locations, smoothing must be done 
through the thin plate spline parameter \iw (analogous to A2 in the proposed 
dimension expansion framework). Figure[3]shows the warped grids and resulting 
empirical variograms for various settings of \iw applied to the ellipsoid example 
introduced in Section 2.1. We observe immediately that for a highly penalized 
/ (corresponding to large Xiw) the space does not fold; however, the variogram 
fit is very poor. As Xjw is relaxed to improve the fit, the space begins to fold, 
highlighting a potentially serious problem with the image warping framework 



7 



Warped Grid: k m - 1 
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Figure 3: Warped grid of locations (top) and corresponding variograms (bottom) 
for various settings of the thin plate spline parameter Xiw using the image 



warping technique of Sampson and Guttorp (1992). 



- an issue which is addressed in the dimension expansion paradigm proposed 
here. 

Also related to the proposed dimension expansion method are latent space 
models such as that proposed by |Hoff et al.|p002[ ). Here, latent dimensions are 
used to help learn a network of relationships between individuals. Recent work 
in the field of spatial statistics has also exploited latent dimensions to ensure 
valid cross-covariance functions in multivariate fields. Specifically. Apanasovich 



and Genton (2010) use latent dimensions for the different variables in order to 



build a class of valid cross-covariance functions. 



3 Applications 



We now present two applications of dimension expansion applied to the modeling 
of nonstationary processes using two real data sets. The first uses the solar 



radiation data (Hay (1984)) studied in the original image warping paper of 
Sampson and Guttorp (1992). The second consists of measurements from a 



network of air pollution (black smoke) monitoring sites in the UK, further details 
of which can be found in Elliott et al. (2007). 



3.1 Solar radiation 

The data of Hay ( 19841) includes measurements of solar radiation from 12 sta- 



tions in the area surrounding Vancouver. Due to the location and elevation 
of station 1 (Grouse mountain), the field is inherently nonstationary, as exhib- 
ited by the sample variogram (Figure [4} . This figure shows the original and 
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Figure 4: Original locations and empirical variogram for the solar radiation 
data (left); warped locations and associated empirical variogram using image 
warping with Xiw = 0.1 (centre); learned locations with associated empirical 
variogram using dimension expansion with Ai = 0.5, A2 = 10~ 4 (right). The 
units for the semi- variance are (MJm _2 day -1 ) 2 , and for distance are km (UTM 
coordinates, divided by 1000). The fitted variogram is shown by a solid line, 
and points associated with Grouse mountain (station 1) are highlighted using 
an "X". 



warped locations using Sampson and Guttorp's image warping approach with 
corresponding variogram plot. Image warping moves the locations (in partic- 
ular the station at Grouse mountain, which is the northernmost location) to 
achieve something closer to stationarity. It is worth noting that ovcrfitting may 
be controlled through the parameter Xiw of the thin plate spline. 

Figures [4] and [5] show the results of applying the dimension expansion ap- 
proach using Ai = 0.5 and Ai = 0.2, respectively, using a maximum number 
of dimensions of p = 5. The original locations are shown, as well as the added 
dimensions (Z). With Ai = 0.5 (Figure|4j right), dimension expansion adds one 
additional dimension which primarily serves to push Grouse mountain out of 
the plane, reflecting the a priori suggestion that it is elevation that leads to the 
station's spurious correlation pattern. Interestingly, the contours of the learned 
dimension closely resemble the elevation contours of the mountains surrounding 
the Vancouver area. With Ai = 0.2 (Figure[5]), 2 extra dimensions are used, and 
the fit of the parametric variogram improves marginally. We can therefore see 
how Ai influences the number of extra dimensions used, as well as the shrinkage 
in each dimension, in order to control the level of fit. 
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Learned Locations (Zi): Xi - 0.2 Learned Locations {Z z ): K\ - 0.2 Variogram 




Figure 5: Dimension expansion of the solar radiation surface using Ai = 
0.2, A2 = 10~ 4 . Z here is 5 dimensions, with Z 3 , Z4, and Z 5 set to zero as a result 
of the sparsity-inducing penalization. The first two panels show the learned loca- 
tions, and the right-most panel shows the associated empirical variogram (fitted 
variogram shown in red) . The units for the semi- variance are 
and for distance are km. Points associated with Grouse mountain (station 1) 
highlighted using an "X" . 



3.2 Air pollution 

The data consists of annual average concentrations of black smoke (/Ltgm -3 ) over 
a period of sixteen years from 77 locations within the UK operating between 
April 1978 and March 1993 (inclusive) and was obtained from the Great Britain 
air quality archive (www.airquality.co.uk). Sites were selected in areas defined 
wholly or partially residential and measurements were aggregated to ward level 



(based on the 1991 census) using a geographical information system (Elliott 
et al. ( 2007| )). The majority of wards contained a single site, but where there 



were more than one, records were either joined together if the time periods did 
not overlap or averaged if time periods of operation were simultaneous. Due 
to similarities in levels of air pollution in urban locations, even if they are not 
geographically close, the field is known to be nonstationary. Specifically, we 
see in Figure [8] reduced empirical dispersions for distances around 280 — 290km 
(the distance between London and Liverpool/Manchester), indicating that these 
urban centers are more similar than their distances would suggest. Our goal is 
to uncover and explore this nonstationarity through the dimension expansion 
framework. 

We begin with cross-validation to learn the optimal setting of the parameters 
Ai,A2 using ([2]) as described in Section 2. Figure [6] shows the resulting cross- 
validation RMSE for various parameter settings. We can see that moderate 
values of both Ai and A2 result in the best prediction performance. As Ai 
increases to its highest value (10 4 5 ), no dimension expansion occurs, and hence 
A2 has no impact. From this it is straightforward to see that the use of the 
original geographic space is a special case of the dimension expansion framework. 

Using these parameter values, the dimensionally sparse optimization ^ used 
by dimension expansion leaves all but one dimension of Z set to zero. This 
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CV Error from UK Black Smoke Data 




log«(X,) 



Figure 6: Leave- 10-out cross- validated prediction error of dimension warping ap- 
plied to the UK black smoke data. Here we see optimal prediction for moderate 
values of both Ai and A2. 



dimension is shown in Figure [7J where we see a strong ridge connecting London, 
Birmingham, Liverpool, and Manchester. Hence in the extra dimension major 
cities are moved closer together while rural areas are pushed further away. The 
variograms before and after the dimension expansion are shown in Figure [8j 
where we see no indications of nonstationarity after dimension expansion is 
applied. 



4 Discussion 



By augmenting the dimensionality of the underlying geographic space, we have 
developed a highly flexible approach for handling the nonstationarity that often 
arises in environmental systems. While ostensibly similar to image warping, 
the proposed method avoids the issue of folding and allows one to model much 
more complex nonstationarity patterns through interdimensional expansions, 
allowing the user to perform nonparametric learning of the mapping function. 
In addition, through the use of a group lasso penalty, we are able to estimate 
the number of augmented dimensions, as well as regularize the optimization 
problem. Lastly, we have highlighted the practical application of the dimension 
expansion approach in three examples, two of which use data from observed 
environmental processes. It is worth noting that while we have developed the 
spatial model in terms of variograms, it could alternatively be expressed in 
terms of covariances; see, for example, Gneiting et al. (2001) for a thorough 
comparison of the two approaches. 

In general, models will comprise a spatial mean or 'trend' term together with 
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Extra-dimensional mapping (Z) 




2e+05 3e+05 4e+05 5a+05 

Ea slings 



Figure 7: Map of the learned dimension following dimension expansion, which 
has found a strong ridge connecting the major cities, indicating closer correlation 
between these locations than would be suggested in geographic space. 



Variogram (Observed Locations) Variogram (Learned Locations) 




Distance Distance 



Figure 8: Binned empirical and fitted (solid line) variograms on the UK black 
smoke data, following dimension expansion. In the original geographic space, 
we see a dip in the empirical variogram at roughly 280/cto, corresponding to the 
distance between London and Liverpool/Manchester. After dimension expan- 
sion is applied, the ridge between London and Liverpool/Manchester removes 
this effect of nonstationarity. The units for the semi-variance are (^gm -3 ) 2 and 
for distance are km. 
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spatial covariance for deviations from this trend. It is desirable to maximally 
reflect the variation in the response using the mean function and thus known 
covariates, but inevitably the mean function will not be able to capture all 
of the spatial variation and thus residual spatial variation must be modeled 
specifically. When all relevant covariates are included in the mean term, it is 
commonly assumed that the resulting spatial term is stationary. However, as the 
Karhunen-Loeve expansion shows, the modeling of spatial trend and covariance 
are inseparable and misspecification of the former will induce a second order 
distortion in the latter, thus violating any assumptions of stationarity in many 
cases. Due to the complexity of environmentally processes, mis-specification 
is inevitable because all relevant covariates can never be known or, even if 
known, observed. In the air pollution example presented here, concentrations 
in cities appeared to be more similar than that of rural areas irrespective of 
their geographic proximity. If available, it would be possible to incorporate a 
measure of rurality in the mean function, possibly produced using a geographical 
information system based on population density data. However, even if such 
information were available, stationarity would still not be guaranteed and so 
there is a need for methods such as that proposed here to allow nonstationary 
models to be constructed for the spatial process. 

A Bayesian image warping approach which allows covariate effects to be 
included in the correlation structure has recently been suggested by |Schmidt| 



et al. (2011). By treating covariates as analogous to geographic coordinates, 



they warp the combined location-covariate space into a deformed space of the 
same dimension. To achieve computational efficiency, they consider a special 
case which restricts the form of the possible mapping function and assumes the 
spatial process to be a 2D manifold with covariates treated as separate values 
at each location. 

In practice, environmental data will often take the form of a number of mea- 
surements made over time at each location rather than true spatial replications 
per se. In order to try and isolate the purely spatial part of the process, the mean 
function may incorporate a temporal component into the mean function, mod- 
elling underlying temporal patterns and allowing the possibility of time- varying 
covariates, or even space-time interactions. In the absence of such covariate 
information, it would be possible to consider the notion of time- varying non- 
stationarity structure. For instance, if one wants to study the changing impact 
of cities and industrial areas on air pollution levels, examining changes in sta- 
tionarity over time would be a valuable way to understand these changes. The 
dimension expansion framework is also amenable to multivariate extensions. We 
are currently exploring a scenario whereby the dimension expansion functions 
and locations have a hierarchical structure, allowing the dimension expansion 
to vary for different variables, yet be tied together through the hierarchy. 

We have demonstrated how the proposed approach can be used to perform 
predictions in the transformed, stationary space and mapped back to the orig- 
inal space. At present the choice of the mapping, learning of latent locations, 
and prediction are performed in isolation. As the Sampson and Guttorp (1992) 
approach was set within a Bayesian framework by Damian et al. ( 2001[ ) and 
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Schmidt and O'Hagan (2003), setting the current algorithmic approach within 
such an inferential framework would allow the inherent uncertainty to be accu- 
rately reflected in resulting inferences and this is the goal of current research. 
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A Optimization of Equation d2 



As with traditional multi-dimensional scaling, penalization functions of the form 
([I]) do not have a unique maximum. However, the learned locations are unique 
up to rotation, scaling, and sign. The optimization problem ([2| is more regular- 
ized, however, due to the presence of the ^-norm. Specifically, not all rotations 
and scalings of the learned locations will have the same ^i-norm, and hence the 
resulting optimization is unique only up to sign and indices of zero/non-zero 
dimensions. 

In our experience, traditional optimization procedures such as Nelder-Mead 
or the Broyden-Fletcher-Goldfarb-Shanno method (Nocedal and Wright (1999)) 



work well for a moderate amount of locations (s < 100) and dimensions (p < 3). 
For larger problems, it may be necessary to use purpose-built optimization rou- 
tines intended for generalized group lasso. Let Q(U) be the first term in equation 
[2j where U — [X, Z]. Then column k of the gradient matrix is 

2 

Vkfi(t/) = -T o (U.^lpxp - lpxpU.,k) ipxi 

where 

ry = (7*(«-^)^ 

Using this gradient information, the gradient projection algorithm of |Kim et al.| 
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(2006) may be used to optimize The algorithm proceeds as follows: 



Initialize : U° = 0, a : sufficiently small positive constant 
for t = 1,...,T do 

Set u = U*- 1 - aVniU*- 1 ) and r) = {1, . . . ,p} 
while Afj > Vj do 
Forj = l,...,p 



M j =I(jev)x IKII + 



Set r? = {j : > 0} 



end 

Set 17* 



X J||« 3 || 



for j = 1, 



end 

Return C7 T 

Further algorithmic details, such as the tuning of M and the setting of the 
algorithmic parameter a, can be found in Kim et al. (2006). 
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